My project this semester aimed to analyze the microbial community of the chordate Polycarpa aurata and to build a phylogenetic tree of the bacteria present in this organism. Illumina sequencing was used to gather the genetic information of the samples, and DADA2 and phyloseq were used to analyze the reads.
A total of 176 samples were collected at 11 locations with the goal of learning more about their microbial communities. Polycarpa aurata is native to the tropical eastern Indian Ocean and the western Pacific Ocean. This particular batch of samples was collected in the Philippines/Indonesia area. Each dot on the map represents a location where samples of Polycarpa aurata were collected.
To clean the data, I followed the workflow given by Dr. Zahn in his paper “Marker Genes (16S and ITS) Protocol for Plant Microbiome Analyses.”
The first step in the process is removing the primers from the sequences. Primers are artificially added during amplification and need to be removed so that the sequences can be analyzed properly. In these samples, the forward primer to be removed was “GTGCCAGCMGCCGCGGTAA” and the reverse primer was “GGACTACHVGGGTWTCTAAT.” Once the primers were removed, I could move on to quality filtration.
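Both primers contain degenerate IUPAC bases (M, H, V, W), so removing them means matching a pattern rather than a fixed string. The sketch below illustrates that idea in plain Python. It is not the tool used in the workflow: the helper names and the example read are made up for illustration, and it assumes the primer sits at the very start of the read with no mismatches, which dedicated trimming tools do not require.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

FORWARD_PRIMER = "GTGCCAGCMGCCGCGGTAA"
REVERSE_PRIMER = "GGACTACHVGGGTWTCTAAT"


def primer_to_regex(primer):
    """Build a regex, anchored at the read start, that accepts every
    sequence the degenerate primer can represent."""
    return re.compile("^" + "".join("[" + IUPAC[base] + "]" for base in primer))


def strip_primer(read, primer):
    """Return the read without its leading primer, or None if the
    primer is not found at the start of the read."""
    match = primer_to_regex(primer).match(read)
    return read[match.end():] if match else None


# Made-up read: the forward primer (with M resolved to A) plus a short insert.
example_read = "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGT"
print(strip_primer(example_read, FORWARD_PRIMER))
```

Reads for which strip_primer returns None would normally be discarded or inspected, since a missing primer usually means the read is not a genuine amplicon.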
As the quality profiles show, the ends of the reads are of lower
quality, so they must be trimmed. Forward reads are usually of higher
quality than reverse reads, so this must also be taken into account
when trimming. With these samples, I decided to truncate the forward
reads at position 250 and the reverse reads at position 160 to ensure
that the quality score stayed above 20 (99% base-call accuracy).
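The link between a Phred quality score Q and base-call accuracy follows from P(error) = 10^(-Q/10). The sketch below shows that conversion and one simple way to pick a truncation position from per-position mean quality. The function names and the mean-quality values in the test call are hypothetical, and the Phred+33 offset is an assumption about how the FASTQ files are encoded.

```python
def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy_percent(q):
    """Base-call accuracy, in percent, for Phred score q."""
    return 100 * (1 - phred_to_error_probability(q))


def decode_quality_string(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.
    The offset of 33 assumes Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def truncation_length(mean_quality, threshold=20):
    """Number of leading positions kept before the mean quality
    first drops below the threshold."""
    for position, q in enumerate(mean_quality):
        if q < threshold:
            return position
    return len(mean_quality)


for q in (10, 20, 30):
    print(f"Q{q}: {phred_to_accuracy_percent(q):.1f}% accuracy")
```

A score of 20 corresponds to 1 error in 100 base calls (99% accuracy), and a score of 30 to 1 error in 1,000 (99.9% accuracy).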
I then moved on to the ASV inference part of the workflow. ASVs are
amplicon sequence variants: exact DNA sequences inferred from the reads
after sequencing errors have been corrected, which can then be matched
to the organisms they came from. Before ASVs can be inferred, an error
model must be built to correct the mistakes made during sequencing.
The next figure shows the error model that was made.
Once this was done, I moved on to building the phylogenetic trees. I assigned taxonomy using the SILVA version 138.1 database and formatted the trees using ggtree.
The end product is the phylogenetic tree of the bacteria present in Polycarpa aurata. This tree shows the phylogeny down to class.
The longer a line is, the more abundant that particular phylum
was in the samples.
This next tree shows the phylogeny of the bacteria present in the samples down to family.
We want to find out whether the microbial diversity of Polycarpa aurata changes with location. The metadata we have includes GPS coordinates, and the goal is to find out whether there are any real differences in diversity among these locations.
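One common way to put a number on diversity is the Shannon index, H = -Σ p_i ln(p_i), where p_i is the proportion of reads assigned to taxon i in a sample. The sketch below computes it per sample and averages it by location. It only illustrates the comparison; it is not the analysis that was run. The site names and counts are invented, and in practice this step would more likely be done with the tools already used in the workflow.

```python
import math
from collections import defaultdict


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


def mean_diversity_by_location(samples):
    """samples is a list of (location, per-taxon counts) pairs.
    Returns a dict mapping each location to its mean Shannon index."""
    grouped = defaultdict(list)
    for location, counts in samples:
        grouped[location].append(shannon_index(counts))
    return {loc: sum(values) / len(values) for loc, values in grouped.items()}


# Invented counts for illustration only; not data from this project.
toy_samples = [
    ("site_A", [50, 50, 0, 0]),
    ("site_A", [25, 25, 25, 25]),
    ("site_B", [100, 0, 0, 0]),
]

for location, h in mean_diversity_by_location(toy_samples).items():
    print(location, round(h, 3))
```

A sample dominated by a single taxon scores 0, and the index rises as reads are spread more evenly across more taxa. Deciding whether differences between locations are real would still require a statistical test on the per-sample values.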